## Data Download
First, the full logTPM matrix and the clinical data should be downloaded via the API. Here we use a local file instead. The model identifiers in the clinical data should match the column names of the expression matrix. We should also prepare an API for the clinical table.
We also need to tidy the data matrix; ideally, the API should return a ready-to-use table.
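library(magrittr)  # provides the %>% pipe
# Read the matrix as a plain data.frame and use the gene symbols as row names
log.tpm <- data.table::fread(local.tpm.path, data.table = FALSE) %>%
  tibble::column_to_rownames('Gene')
log.tpm$V1 <- NULL  # drop the V1 column, if present
log.tpm <- round(log.tpm, 3)
log.tpm[1:5, 1:5]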
## CTG-0009 CTG-0011 CTG-0012 CTG-0017 CTG-0018
## A1BG 1.735 0.308 5.203 1.845 4.325
## A1CF 0.000 2.104 0.137 0.075 0.060
## A2M 8.567 0.293 9.779 2.557 0.043
## A2ML1 0.773 1.150 4.257 2.184 5.532
## A4GALT 2.572 2.708 2.205 1.256 3.796
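The requirement that the clinical data match the matrix columns can be checked before any merge. The following is a minimal sketch on toy data, assuming the clinical table carries a numeric `model_id` column (as the later merge suggests); the `indication` column name and all values are hypothetical.

```r
# Toy stand-ins for log.tpm and the clinical table (values are made up)
toy.tpm <- data.frame(`CTG-0009` = c(1.7, 0.0), `CTG-0011` = c(0.3, 2.1),
                      check.names = FALSE, row.names = c("A1BG", "A1CF"))
toy.clinical <- data.frame(model_id = c(9, 12),
                           indication = c("X", "Y"))  # assumed column name

# Convert column names such as "CTG-0009" to a numeric model ID
matrix.ids <- as.numeric(sub("^CTG-", "", colnames(toy.tpm)))

# Models present on one side only
missing.in.clinical <- setdiff(matrix.ids, toy.clinical$model_id)
missing.in.matrix <- setdiff(toy.clinical$model_id, matrix.ids)
```

Non-empty results indicate models that would get `NA` clinical annotations, or clinical rows with no expression data.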
Before running t-SNE, we select only the 5,000 most variable genes. Genes are ranked by median absolute deviation (MAD) and the top n are kept. We then plot the per-gene means to compare the distribution of the selected genes with that of all genes.
mad.genes <-
apply(log.tpm,1,mad) %>%
sort(decreasing = T) %>%
names() %>%
head(n=5000)
plot_ly(alpha = 0.5) %>% #bigger alpha = more opaque
add_histogram( #adds first histogram
x = ~rowMeans(log.tpm)) %>% #choose numeric var. for first histogram
add_histogram( #add second histogram
x = ~rowMeans(log.tpm[mad.genes,])) %>% #choose numeric var. for second histogram
layout(barmode = "overlay")
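To make the MAD ranking concrete, here is a small sketch on a toy matrix with three made-up genes. It uses the same `apply(..., 1, mad)` and `sort()` pattern as the selection step, but keeps the top 2 rather than the top 5,000.

```r
# Toy expression matrix: rows are genes, columns are samples (values are made up)
toy <- rbind(
  flat  = rep(5, 6),             # no variation at all
  noisy = c(1, 9, 2, 8, 1, 9),   # large spread
  mild  = c(4, 5, 4, 5, 4, 5)    # small spread
)

# Median absolute deviation per gene (row)
gene.mad <- apply(toy, 1, mad)

# Keep the 2 most variable genes
top.genes <- names(sort(gene.mad, decreasing = TRUE))[1:2]
```

The constant gene has a MAD of zero and is dropped, which is the behaviour we want from the filter.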
tsne.results <-
Rtsne::Rtsne(t(log.tpm[mad.genes,]),
dims = 2,
perplexity = 25,
check_duplicates = F)
tsne.coord <- data.frame(tsne.results$Y)
colnames(tsne.coord) <- c('tsne1','tsne2')
tsne.coord$Model <- colnames(log.tpm)
tsne.coord$model_id <- as.numeric(gsub('(CTG-)(\\d{4})','\\2',tsne.coord$Model))
tsne.coord <- merge(tsne.coord,clinical, by.x='model_id',by.y = 'model_id',all.x=T)
The t-SNE is calculated using the Rtsne package in two dimensions with perplexity = 25; this parameter can be tuned later.
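When tuning perplexity, note that Rtsne rejects values that are too large for the number of samples (it requires 3 * perplexity to be at most the number of samples minus 1). The sketch below filters a list of candidate values accordingly; the helper names and the candidate values are illustrative, not part of the original analysis.

```r
# Largest perplexity Rtsne accepts for a given number of samples
max.perplexity <- function(n.samples) floor((n.samples - 1) / 3)

# Keep only the candidate values that are valid for this sample count
candidate.perplexities <- function(n.samples, candidates = c(5, 10, 25, 50)) {
  candidates[candidates <= max.perplexity(n.samples)]
}

valid <- candidate.perplexities(100)

# Each valid value could then be passed to Rtsne::Rtsne(..., perplexity = p)
# and the resulting embeddings compared visually.
```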
Note: if the packages used here are not installed, they should first be installed on the workstation.
We merge the t-SNE coordinates with the indication from the clinical data. Plotly is used to plot the results.
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
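The warning above appears because the Set2 palette has only 8 colours while the plot has more groups than that. One way to avoid it is to pass an explicit colour vector with one entry per group. The sketch below uses toy coordinates and a base-R qualitative palette as a stand-in for Set2; the `indication` column name and the data are assumptions, and the plotly call is only built when the package is available.

```r
# Toy stand-in for tsne.coord; 'indication' is an assumed column name
set.seed(1)
toy.coord <- data.frame(
  tsne1 = rnorm(30),
  tsne2 = rnorm(30),
  indication = rep(paste0("Indication_", 1:10), each = 3)
)

# One colour per group, from a base-R qualitative palette
n.groups <- length(unique(toy.coord$indication))
group.colors <- grDevices::hcl.colors(n.groups, palette = "Set 2")

p <- NULL
if (requireNamespace("plotly", quietly = TRUE)) {
  # Build the scatter plot; call print(p) to display it
  p <- plotly::plot_ly(toy.coord, x = ~tsne1, y = ~tsne2,
                       color = ~indication, colors = group.colors,
                       type = "scatter", mode = "markers")
}
```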
This is an example of downloading specific genes and models. Here we picked all of TGDB’s prostate models together with a specific set of genes. We use pheatmap to plot the heatmap.
requested.genes <- c('AR','KLK3','FOLH1','NKX3-1','MYCN','NKX2-1',
'DLL3','DPF1','SOX2','SYP','CHGA','CHGB','SPINK1',
'HNF4G','HNF1A','MUC5AC','MUC5B','KRT20','PDX1',
'EED','EZH2','SUZ12','SMARCC1','PBRM1','SMARCA1',
'SMARCC2','ACTL6B','ARID1A','ARID1B','SMARCA4',
'SMARCD2','SMARCD1','SMARCD3','ACTL6A','SMARCB1',
'KDM6A','RB1','TP53','CDKN2A')
requested.models <- c('CTG-3581','CTG-3570','CTG-3610','CTG-3537',
'CTG-3421','CTG-2430','CTG-3167','CTG-3337',
'CTG-2440','CTG-2962','CTG-2427','CTG-2428',
'CTG-2429','CTG-2441','CTG-3366')
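brks <- seq(0, 8, by = .05)
# pheatmap expects the color vector to be one element shorter than the breaks
colors <- colorRampPalette(rev(c('red','orange','white','#056DF7','blue')))(length(brks) - 1)
pheatmap::pheatmap(log.tpm[requested.genes, rev(requested.models)],
                   breaks = brks,
                   color = colors,
                   cluster_rows = FALSE,
                   cluster_cols = FALSE
                   # cutree_rows / cutree_cols are omitted: pheatmap ignores them when clustering is off
)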
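Subsetting by name assumes that every requested gene and model exists in the matrix. A minimal sketch of a check to run before plotting is shown below, on toy data; the variable names and values are illustrative only.

```r
# Toy matrix standing in for log.tpm (values are made up)
toy.tpm <- data.frame(`CTG-0001` = c(1, 2, 3), `CTG-0002` = c(4, 5, 6),
                      check.names = FALSE,
                      row.names = c("AR", "KLK3", "TP53"))
wanted.genes <- c("AR", "TP53", "NOT_A_GENE")
wanted.models <- c("CTG-0002", "CTG-9999")

# Keep only the names that exist, preserving the requested order
found.genes <- intersect(wanted.genes, rownames(toy.tpm))
found.models <- intersect(wanted.models, colnames(toy.tpm))

# Report anything that was dropped
dropped <- c(setdiff(wanted.genes, found.genes),
             setdiff(wanted.models, found.models))
if (length(dropped) > 0) message("Not found: ", paste(dropped, collapse = ", "))

subset.tpm <- toy.tpm[found.genes, found.models, drop = FALSE]
```

Without such a check, a missing gene yields a row of `NA` values in a data.frame subset, and a missing model raises an "undefined columns selected" error.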